On-Line Labeled Topic Model

نویسندگان

  • YongHeng Chen
  • Yaojin Lin
  • Hao Yue
چکیده

A large number of electronic documents are labeled using human-interpretable annotations. High-efficiency text mining on such data set requires generative model that can flexibly comprehend the significance of observed labels while simultaneously uncovering topics within unlabeled documents. This paper presents a novel and generalized on-line labeled topic model (OLT) tracking the time development of extracted topics through a structured multi-labeled data set. Our topic model has an incrementally updated principle based on time slices in an on-line fashion, and can detect dynamic trending for labeled topics in parallel. Empirical results are presented to demonstrate lower perplexity and high performance of our proposed model when compared with other models.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dirichlet Process with Mixed Random Measures: A Nonparametric Topic Model for Labeled Data

We describe a nonparametric topic model for labeled data. The model uses a mixture of random measures (MRM) as a base distribution of the Dirichlet process (DP) of the HDP framework, so we call it the DPMRM. To model labeled data, we define a DP distributed random measure for each label, and the resulting model generates an unbounded number of topics for each label. We apply DP-MRM on single-la...

متن کامل

The Polylingual Labeled Topic Model

In this paper, we present the Polylingual Labeled Topic Model, a model which combines the characteristics of the existing Polylingual Topic Model and Labeled LDA. The model accounts for multiple languages with separate topic distributions for each language while restricting the permitted topics of a document to a set of predefined labels. We explore the properties of the model in a two-language...

متن کامل

Learning Hybrid Models for Image Annotation with Partially Labeled Data

Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image...

متن کامل

Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora

A significant portion of the world’s text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vi...

متن کامل

یک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجره‌های هم‌پوشان

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016